Research Methodologies, Observations and Outcomes in (Conversational) Speech Data Collection

Authors

Christopher Cieri

David Miller

Kevin Walker

Abstract

This paper presents research methodologies for collecting speech data and reports observations from a recent set of conversational speech collections before describing their outcomes. The presentation begins with a comparison of the relative challenges posed by broadcast news, telephone conversation and meeting recordings. The remainder of the discussion covers methods for collecting conversational data, with special focus on two recent Switchboard collections. We identify methods that have allowed for very cost-efficient collection of Switchboard data. We conclude with a summary of generally available resources that result from the efforts described herein.


Similar resources

From switchboard to fisher: telephone collection protocols, their uses and yields

This paper describes several methodologies for collecting conversational telephone speech (CTS) comparing their design, goals and yields. We trace the evolution of the Switchboard protocol including recent adaptations that have allowed for very cost-efficient data collection. We compare Switchboard to the CallHome and CallFriend protocols that have similarly produced CTS data for speech technol...


Ethnomethodology and Conversational Analysis

In a speech community, people utilize their communicative competence which they have acquired from their society as part of their distinctive sociolinguistic identity. They negotiate and share meanings, because they have commonsense knowledge about the world, and have universal practical reasoning. Their commonsense knowledge is embodied in their language. Thus, not only does social life depend...


Towards conversational speech synthesis; lessons learned from the expressive speech processing project

This paper discusses some ideas for the requirements and methods of conversational speech synthesis, based on experience gained from the collection and analysis of a very large corpus of conversational speech in a variety of real-life everyday contexts. It shows that because variation in voice quality plays a significant part in the transmission of interpersonal and affect-related social inform...


Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state-of-the-art models. Collection of labeled conversational audio, however, is prohibitively expensive, laborious and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with suf...


The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multimicrophone conversational ASR in real home environments. Speech material was elicited using a dinn...




Publication date: 2002
